Video description: A comprehensive survey of deep learning approaches
نویسندگان
چکیده
Abstract Video description refers to understanding visual content and transforming that acquired into automatic textual narration. It bridges the key AI fields of computer vision natural language processing in conjunction with real-time practical applications. Deep learning-based approaches employed for video have demonstrated enhanced results compared conventional approaches. The current literature lacks a thorough interpretation recently developed sequence techniques description. This paper fills gap by focusing mainly on deep learning-enabled caption generation. Sequence models follow an Encoder–Decoder architecture employing specific composition CNN, RNN, or variants LSTM GRU as encoder decoder block. standard-architecture can be fused attention mechanism focus distinctiveness, achieving high quality results. Reinforcement learning within structure progressively deliver state-of-the-art captions following exploration exploitation strategies. transformer is modern efficient transductive robust output. Free from recurrence, solely based self-attention, it allows parallelization along training massive amount data. fully utilize available GPUs most NLP tasks. Recently, emergence several versions transformers, long term dependency handling not issue anymore researchers engaged summarization description, autonomous-vehicle, surveillance, instructional purposes. They get auspicious directions this research.
منابع مشابه
The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches
Deep learning has demonstrated tremendous success in variety of application domains in the past few years. This new field of machine learning has been growing rapidly and applied in most of the application domains with some new modalities of applications, which helps to open new opportunity. There are different methods have been proposed on different category of learning approaches, which inclu...
متن کاملA Comprehensive Survey of Data Processing Approaches
In Wireless Sensor Networks (WSNs), energy of the sensor nodes are the most concerning factor because each sensor is equipped with limited amount of energy which is used to perform lots of work. Sensors have capabilities of sensing, analyzing, processing and communication of the data. In WSNs, the most responsible factor behind energy consumption of the sensor nodes is Data Processing which mai...
متن کاملA Comprehensive Survey of Video Encryption Algorithms
Encryption is the widely used technique to offer security for video communication and considerable numbers of video encryption algorithms have been proposed. The paper explores the literature for already proposed video encryption algorithms with the focus on the working principle of already proposed video encryption schemes. This study is aimed to give readers a quick overview about various vid...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Artificial Intelligence Review
سال: 2023
ISSN: ['0269-2821', '1573-7462']
DOI: https://doi.org/10.1007/s10462-023-10414-6